
    Circuit Design, Architecture and CAD for RRAM-based FPGAs

    Field Programmable Gate Arrays (FPGAs) have become indispensable components of embedded systems and datacenter infrastructures. However, energy efficiency has become a hard barrier preventing their expansion to more application contexts, due to two physical limitations: (1) The massive use of routing multiplexers causes delay and power overheads compared to ASICs. To reduce power consumption, FPGAs have to operate at a low supply voltage, but they sacrifice performance because transistor drive strength degrades as the supply voltage decreases. (2) Volatile memory technology forces FPGAs to lose their configuration when powered off and to be reconfigured at each power-on. Resistive Random Access Memories (RRAMs) have strong potential for overcoming these physical limitations of conventional FPGAs. First, RRAMs grant FPGAs non-volatility, enabling FPGAs to be "Normally powered off, Instantly powered on". Second, by combining the functionality of memory and pass-gate logic in a single device, RRAMs can greatly reduce the area and delay of routing elements. Third, when RRAMs are embedded in datapaths, circuit performance can be independent of the working voltage, overcoming a limitation of CMOS circuits. However, research and development of RRAM-based FPGAs are still in their infancy. Most area and performance predictions were made without solid circuit-level simulations and sophisticated Computer Aided Design (CAD) tools, which makes the predicted improvements less convincing. In this thesis, we present high-performance and low-power RRAM-based FPGAs, from transistor-level circuit designs to architecture-level optimizations and CAD tools, using theoretical analysis, industrial electrical simulators and novel CAD tools.
We believe that this is the first systematic study in the field, covering three perspectives. From a circuit design perspective, we propose efficient RRAM-based programming circuits and routing multiplexers through both theoretical analysis and electrical simulations. The proposed 4T1R (four-transistor, one-RRAM) programming structure demonstrates significant improvements in programming current compared to the most popular 2T1R programming structure. 4T1R-based routing multiplexer designs are proposed by considering various physical design parasitics, such as the intrinsic capacitance of RRAMs and the well doping organization. The proposed 4T1R-based multiplexers significantly outperform the best CMOS implementations in area, delay and power, in both the nominal and near-Vt regimes. From a CAD perspective, we develop a generic FPGA architecture exploration tool, FPGA-SPICE, which models a full FPGA fabric with SPICE and Verilog netlists. FPGA-SPICE provides different levels of testbenches and techniques to split large SPICE netlists, in order to obtain a better trade-off between simulation time and accuracy. FPGA-SPICE can capture the area and power characteristics of SRAM-based and RRAM-based FPGAs more accurately than the current best analytical models. From an architecture perspective, we propose architecture-level optimizations for RRAM-based FPGAs and quantify the minimum requirements they impose on RRAM devices. Compared to the best SRAM-based FPGAs, an optimized RRAM-based FPGA architecture brings significant reductions in area, delay and power. In particular, RRAM-based FPGAs operating in the near-Vt regime demonstrate a 5x power improvement without delay overhead compared to an optimized SRAM-based FPGA operating at nominal voltage.
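    To illustrate what "modeling a fabric with SPICE netlists" involves at the smallest scale, the Python sketch below emits a SPICE subcircuit for a single one-level RRAM multiplexer. It is not FPGA-SPICE's code or output format: the function name `rram_mux_subckt`, the resistor-only RRAM model (programming transistors and parasitics omitted) and the resistance values `R_LRS`/`R_HRS` are all hypothetical assumptions. A generator in the spirit of FPGA-SPICE can be thought of as producing many such blocks from an architecture description and wrapping them in testbenches.

```python
# Hypothetical RRAM resistances in ohms; illustrative values only,
# not taken from the thesis.
R_LRS = 5_000        # low-resistance state: the selected (programmed-on) input
R_HRS = 20_000_000   # high-resistance state: every unselected input


def rram_mux_subckt(name: str, n_inputs: int, selected: int) -> str:
    """Return a SPICE .subckt for a one-level N:1 RRAM multiplexer.

    Each input reaches the output through one RRAM, modeled as a plain
    resistor. Programming transistors (e.g. 4T1R) and parasitics are omitted.
    """
    if not 0 <= selected < n_inputs:
        raise ValueError("selected input out of range")
    ports = " ".join(f"in{i}" for i in range(n_inputs))
    lines = [f".subckt {name} {ports} out"]
    for i in range(n_inputs):
        resistance = R_LRS if i == selected else R_HRS
        lines.append(f"R{i} in{i} out {resistance}")
    lines.append(f".ends {name}")
    return "\n".join(lines)


print(rram_mux_subckt("rram_mux4", 4, selected=2))
```

For a 4:1 multiplexer with input 2 selected, the output is one `.subckt` line, four resistor cards (only `R2` in the low-resistance state) and an `.ends` line.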

    Exploring the Impact of Ions on Oxygen K-Edge X-ray Absorption Spectroscopy in NaCl Solution using the GW-Bethe-Salpeter-Equation Approach

    X-ray absorption spectroscopy (XAS) is a powerful experimental tool for probing the local structure of materials through core-hole excitations. Here, the oxygen K-edge XAS spectra of NaCl solution and pure water are computed using a recently developed GW-BSE approach, based on configurations modeled by path-integral molecular dynamics with a deep-learning technique. The neural network is trained on ab initio data obtained with SCAN density functional theory. The changes in the XAS features of NaCl solution relative to pure water agree well between experiment and theory. We provide detailed explanations for the spectral changes that occur when NaCl is solvated in pure water. Specifically, the presence of solvating ion pairs leads to localization of electron-hole excitons. Our theoretical XAS results support the view that the effects of the solvating ions on the H-bond network are mainly confined within the first hydration shell of the ions, while beyond that shell the arrangement of water molecules remains comparable to that observed in pure water. Comment: 18 pages, 4 figures

    Accurate Power Analysis for Near-Vt RRAM-based FPGA

    Resistive Random Access Memory (RRAM)-based FPGA architectures employ RRAMs not only as memories that store the configuration but also embed them in the datapaths of programmable routing resources to propagate signals with improved performance. The sources of power consumption have been intensively studied for conventional Static Random Access Memory (SRAM)-based FPGAs. However, very few works have so far studied the power characteristics of RRAM-based FPGAs. In this paper, we first analyze the power characteristics of RRAM-based multiplexers at circuit level and then use electrical simulations to study the power consumption of RRAM-based FPGA architectures. Experimental results show that RRAM-based FPGAs reduce the Power-Delay Product by 50% compared to an SRAM-based FPGA at nominal voltage, and by 20% compared to a near-Vt SRAM-based FPGA.
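    The results above are stated as Power-Delay Product (PDP) improvements at nominal and near-Vt voltage. The Python sketch below uses the textbook first-order switching-power model, P = alpha * C * Vdd^2 * f, to show why lowering the supply voltage is attractive and why PDP, not power alone, is the fair metric: any delay increase at low voltage cancels part of the power gain. It holds alpha, C and f fixed, ignores leakage (which matters more near threshold), and the voltages and circuit parameters are hypothetical, not values from the paper.

```python
def dynamic_power(alpha: float, cap: float, vdd: float, freq: float) -> float:
    """First-order CMOS switching power, P = alpha * C * Vdd**2 * f (watts)."""
    return alpha * cap * vdd ** 2 * freq


def power_delay_product(power: float, delay: float) -> float:
    """PDP = P * D: energy per operation (joules if P in W and D in s)."""
    return power * delay


# Hypothetical operating points; only the ratio between them matters here.
V_NOMINAL, V_NEAR_VT = 0.9, 0.5
p_nom = dynamic_power(alpha=0.1, cap=1e-12, vdd=V_NOMINAL, freq=1e9)
p_nvt = dynamic_power(alpha=0.1, cap=1e-12, vdd=V_NEAR_VT, freq=1e9)
print(f"dynamic power ratio near-Vt/nominal: {p_nvt / p_nom:.3f}")

# With an unchanged delay, PDP scales exactly like power; a slower
# circuit at near-Vt would multiply this ratio by its delay ratio.
delay = 1e-9
print(power_delay_product(p_nvt, delay) / power_delay_product(p_nom, delay))
```

The power ratio is simply (0.5/0.9)^2 here; the RRAM argument in the abstract is that the delay term need not grow as much as it does for CMOS pass-gate routing when the voltage is lowered.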

    Novel Configurable Logic Block Architecture Exploiting Controllable-Polarity Transistors, invited paper

    Controllable-polarity transistors exhibit device-level configurability: they can be dynamically configured as either n-type or p-type. This property can be exploited in Field Programmable Gate Arrays (FPGAs) to replace traditional Look-Up Tables (LUTs) with more powerful configurable units. We report here on a new FPGA logic block architecture, called MCluster, that takes direct advantage of configurable transistors. The performance of the approach is evaluated and compared to its traditional Complementary Metal-Oxide-Semiconductor (CMOS) counterpart at the 22-nm technology node. We observe an average saving of 64% in the area×delay×power product.
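    The device property described above can be pictured with a switch-level model. The Python sketch below assumes the common double-gate description of controllable-polarity devices, in which a polarity gate PG selects n-type (PG=1) or p-type (PG=0) behavior, so the channel conducts exactly when the control gate CG equals PG. Under that idealized assumption, four such devices realize XOR, a function that needs more transistors in standard static CMOS. The functions `cp_conducts` and `cp_xor` are hypothetical, ignore electrical effects such as degraded logic levels, and do not describe the MCluster logic block itself.

```python
from itertools import product


def cp_conducts(cg: int, pg: int) -> bool:
    """Switch-level controllable-polarity transistor (idealized assumption).

    PG=1 -> n-type, on when CG=1; PG=0 -> p-type, on when CG=0.
    Hence the device conducts exactly when CG == PG.
    """
    return cg == pg


def cp_xor(a: int, b: int) -> int:
    """XOR built from four controllable-polarity devices, switch level only."""
    na, nb = 1 - a, 1 - b
    # Pull-up network: two devices in parallel, each on iff a != b.
    pull_up = cp_conducts(a, nb) or cp_conducts(na, b)
    # Pull-down network: two devices in parallel, each on iff a == b.
    pull_down = cp_conducts(a, b) or cp_conducts(na, nb)
    if pull_up == pull_down:
        raise RuntimeError("exactly one network should conduct")
    return 1 if pull_up else 0


for a, b in product((0, 1), repeat=2):
    print(a, b, cp_xor(a, b))
```

Because polarity is itself an input, a single device behaves like a switch controlled by an XNOR of its two gates, which is what makes XOR-rich functions cheap in this technology.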

    Convert widespread paraelectric perovskite to ferroelectrics

    While nature provides a plethora of perovskite materials, only a few exhibit large ferroelectricity and possibly multiferroicity. The majority of perovskite materials have the non-polar CaTiO3 (CTO) structure, which limits the scope of their applications. Based on an effective Hamiltonian model as well as first-principles calculations, we propose a general thin-film design method to stabilize the functional BiFeO3 (BFO)-type structure, which is a common metastable structure in widespread CaTiO3-type perovskite oxides. We find that the improper antiferroelectricity in CTO-type perovskites and the ferroelectricity in BFO-type perovskites depend differently on mechanical and electric boundary conditions; both involve oxygen octahedral rotations and tilts. This difference can be used to stabilize the highly polar BFO-type structure in many CTO-type perovskite materials.

    Pattern-Based FPGA Logic Block and Clustering Algorithm

    In classical FPGAs, LUTs and DFFs are pre-packed into BLEs, and BLEs are then grouped into logic blocks. We propose a novel logic block architecture with fast combinational paths between LUTs, called pattern-based logic blocks. A new clustering algorithm is developed to exploit the potential of pattern-based logic blocks. Experimental results show that the novel architecture and the associated clustering algorithm lead to a 14% performance gain and an 8% wirelength reduction, with a 3% area overhead, compared to a conventional architecture on large control-intensive benchmarks.
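    One way to picture clustering for fast LUT-to-LUT paths is to pair LUTs that are directly connected, so that each pair could use a fast combinational path inside one logic block. The Python sketch below is a hypothetical greedy heuristic, not the paper's clustering algorithm: it pairs a driver LUT with its sink only when that driver feeds exactly one LUT, and each LUT joins at most one pair. The function name and the netlist representation (a list of directed edges plus a set of LUT names) are assumptions.

```python
from collections import defaultdict


def pair_lut_chains(edges: list[tuple[str, str]], luts: set[str]) -> list[tuple[str, str]]:
    """Greedily pair LUTs joined by a direct LUT-to-LUT connection.

    A driver is paired only if it feeds exactly one LUT (edges to non-LUT
    nodes are ignored), and each LUT appears in at most one pair.
    """
    lut_fanout: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        if src in luts and dst in luts and src != dst:
            lut_fanout[src].add(dst)

    used: set[str] = set()
    pairs: list[tuple[str, str]] = []
    for src in sorted(lut_fanout):  # deterministic visiting order
        sinks = lut_fanout[src]
        if len(sinks) != 1 or src in used:
            continue
        (dst,) = sinks
        if dst not in used:
            pairs.append((src, dst))
            used.update((src, dst))
    return pairs


netlist = [("a", "b"), ("b", "c"), ("d", "e"), ("d", "f"), ("c", "ff1")]
print(pair_lut_chains(netlist, luts={"a", "b", "c", "d", "e", "f"}))
```

In this toy netlist only (`a`, `b`) is paired: `b` is already used when its own edge is visited, `d` drives two LUTs, and `ff1` is not a LUT. A real clusterer would also weigh timing criticality and block capacity.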

    Optimization Opportunities in RRAM-based FPGA Architectures

    Static Random Access Memory (SRAM)-based routing multiplexers, whatever structure is employed, share a common limitation: their area, delay and power increase linearly with the input size. As a result, most SRAM-based FPGA architectures avoid large multiplexers. Resistive Random Access Memory (RRAM)-based multiplexers built with a one-level structure have a unique advantage over SRAM-based multiplexers: their ideal delay is independent of the input size. This property allows RRAM-based FPGA architectures to use larger multiplexers than their SRAM-based counterparts without any delay overhead. In this paper, by carefully considering the properties of RRAM multiplexers, we find that current state-of-the-art architectural parameters for SRAM-based FPGAs are no longer optimal for RRAM-based FPGAs. We therefore propose that in RRAM-based FPGAs, (a) the routing tracks should be connected to Look-Up Table (LUT) inputs via a one-level crossbar, instead of through Connection Blocks and local routing; (b) the Switch Blocks should employ larger multiplexers; and (c) length-2 wires should be used instead of length-4 wires. When operated at nominal voltage, the proposed RRAM-based FPGA architecture reduces area by 26%, delay by 39% and channel width by 13% compared to an SRAM-based FPGA with a classical architecture. When operated in the near-Vt regime, the proposed RRAM-based FPGA architecture improves the Area-Delay Product by 42% and the Power-Delay Product by 5x compared to a classical SRAM-based FPGA at nominal voltage.
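    The abstract above mixes percentage reductions (area, delay, channel width, Area-Delay Product) with a multiplicative factor (5x for the Power-Delay Product). The small Python helpers below convert both forms into a new-to-old ratio so the figures can be compared side by side. They assume that "reduces X by p%" means X_new = (1 - p/100) * X_old and that "improves X by k x" means X_old / X_new = k; the abstract does not state its convention, so this is a reading aid only.

```python
def ratio_from_percent(pct: float) -> float:
    """'reduced by pct%' -> X_new / X_old (assumed convention)."""
    return 1.0 - pct / 100.0


def ratio_from_factor(factor: float) -> float:
    """'improved by factor x' -> X_new / X_old, reading factor as X_old / X_new."""
    return 1.0 / factor


def percent_from_ratio(ratio: float) -> float:
    """X_new / X_old -> percentage reduction."""
    return (1.0 - ratio) * 100.0


# Figures quoted in the abstract, under the conventions above.
print(f"area  ratio: {ratio_from_percent(26):.2f}")
print(f"delay ratio: {ratio_from_percent(39):.2f}")
pdp_ratio = ratio_from_factor(5)
print(f"PDP 5x -> ratio {pdp_ratio:.2f}, i.e. a {percent_from_ratio(pdp_ratio):.0f}% reduction")
```

Under these assumptions, the 5x PDP improvement corresponds to an 80% PDP reduction, a much larger change than the 42% ADP figure even though both are quoted for the same near-Vt comparison.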

    A Full-Capacity Local Routing Architecture for FPGAs

    Reconfigurable systems employ highly routable local routing architectures to interconnect generic fine-grain logic blocks. Commercial FPGAs employ 50% sparse crossbars rather than fully-connected crossbars in their local routing architecture to trade off area against routability of the Logic Blocks (LBs). While the input crossbar provides good routability and logic equivalence for the inputs of the LB, each LB output is typically assigned to a fixed physical location. This lack of flexibility imposes strong constraints on the global net router. Here, we propose a novel local routing architecture that guarantees full logic equivalence on all input and output pins of the LBs. First, we introduce full-capacity crossbars to interconnect the outputs of the fine-grain Logic Elements (LEs) to the output pins of the LBs. Second, in the local routing, we use a combination of fully-connected and full-capacity crossbars. The full-capacity crossbars are used for the feedback connections in place of the standard fully-connected crossbars, ensuring full routability while reducing the area footprint. Fully-connected crossbars are still employed for the input connections to maintain the logic equivalence of the inputs. As a result, the novel local routing architecture enhances the routability of the LB clusters without any area overhead. By granting logic equivalence to the outputs, the proposed local routing architecture unlocks the full optimization potential of FPGA routers. Architectural simulations show that, without any modification to the Verilog-to-Routing (VTR) tool suite, when a commercial FPGA architecture is considered and over a wide set of benchmarks, the novel local routing architecture reduces channel width by 10% and routing area by 11%, with 10% less area×delay×power on average. Therefore, the novel local routing architecture enhances the routability of FPGAs and opens opportunities for realizing larger implementations on a single FPGA chip.
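    The abstract does not say how its full-capacity crossbars are built. As an illustration only, the Python sketch below uses a classical minimal construction of an N-input, M-output full-capacity sparse crossbar (N >= M): the first M inputs each get one dedicated output, and the remaining N - M inputs connect to every output, giving M(N - M + 1) switches instead of the N*M of a fully-connected crossbar. Any set of at most M inputs can still be routed to distinct outputs, which is what "full capacity" means here; the caller does not choose which output each signal lands on, which is acceptable only when the destination pins are logically equivalent. All function names are hypothetical.

```python
from itertools import combinations


def fully_connected_switches(n_in: int, n_out: int) -> int:
    """Switch count when every input can reach every output."""
    return n_in * n_out


def full_capacity_crossbar(n_in: int, n_out: int) -> dict[int, set[int]]:
    """Map each input to the outputs it can reach (requires n_in >= n_out).

    Inputs 0..n_out-1 each reach one dedicated output; inputs n_out..n_in-1
    reach all outputs. Total switches: n_out * (n_in - n_out + 1).
    """
    if n_in < n_out:
        raise ValueError("this construction needs n_in >= n_out")
    reach = {i: {i} for i in range(n_out)}
    for i in range(n_out, n_in):
        reach[i] = set(range(n_out))
    return reach


def route(n_out: int, chosen: list[int]) -> dict[int, int]:
    """Give each chosen input a distinct output of the crossbar above."""
    if len(chosen) > n_out:
        raise ValueError("more signals than crossbar outputs")
    # Inputs with a dedicated output take it first.
    assignment = {i: i for i in chosen if i < n_out}
    free = sorted(set(range(n_out)) - set(assignment.values()))
    # The remaining inputs reach every output, so any free one will do.
    for i in chosen:
        if i >= n_out:
            assignment[i] = free.pop()
    return assignment


n_in, n_out = 6, 4
xbar = full_capacity_crossbar(n_in, n_out)
print(sum(len(outs) for outs in xbar.values()), "vs", fully_connected_switches(n_in, n_out))

# Exhaustively check that every subset of at most n_out inputs is routable.
for k in range(n_out + 1):
    for chosen in combinations(range(n_in), k):
        routed = route(n_out, list(chosen))
        assert len(set(routed.values())) == len(chosen)
        assert all(routed[i] in xbar[i] for i in chosen)
```

For 6 inputs and 4 outputs this prints `12 vs 24`: half the switches of a fully-connected crossbar, while the exhaustive loop confirms that every subset of up to 4 inputs still finds distinct outputs.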